An improved equilibrium optimization algorithm for feature selection problem in network intrusion detection

In this paper, an enhanced version of the equilibrium optimization (EO) algorithm, named Levy-opposition-equilibrium optimization (LOEO), is proposed to select effective features in network intrusion detection systems (IDSs). The algorithm applies opposition-based learning (OBL) to improve the diversity of the population and uses the Levy flight method to escape local optima. A binary version of the algorithm, called BLOEO, is then applied to feature selection in IDSs. One of the main challenges in IDSs is the high-dimensional feature space, which contains many irrelevant or redundant features. The BLOEO algorithm is designed to select the most informative subset of features. The empirical findings on the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets demonstrate the effectiveness of the BLOEO algorithm: it substantially reduces the number of data features while maintaining a high intrusion detection accuracy of over 95%. Specifically, on the UNSW-NB15 dataset, BLOEO selected only 10.8 features on average, achieving an accuracy of 97.6% and a precision of 100%.

• Presenting a novel feature selection method that employs an improved binary EO.
• Improving the diversity of individuals in the population and the exploration phase of the EO algorithm by using opposition-based learning, and employing the Levy flight to escape from local optima.
• Detecting network intrusions by selecting optimal features with the proposed BLOEO algorithm.
• Evaluating the efficiency of BLOEO using the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets and comparing the test findings with other metaheuristic algorithms in terms of accuracy, recall, specificity, precision, and F-Score.
This paper is organized as follows: Section "Related work" gives a brief review of the related works. Section "Equilibrium optimizer" outlines the standard EO algorithm. Section "Proposed algorithm" includes the details of the proposed algorithm, LOEO. The simulation and results on intrusion detection datasets for the feature selection problem are provided in Section "Experimental results". Finally, Section "Conclusion and future works" contains the conclusions and future directions of the study.

Related work
The issue of network security has become increasingly important as computer networks are being used in various fields. The objective of an intrusion detection system is to detect and prevent unauthorized entry into the system. However, the vast number of features in IDSs poses a challenge. To address this challenge, researchers have proposed multiple feature selection algorithms for IDSs. These algorithms aim to identify the most useful and effective features from the data to enhance the accuracy and efficiency of the IDS.
Zhao et al. 32 introduced a new IDS that combines a feature selection method, CFS-DE, with a weighted stacking classifier to constrain the dimension of the features and enhance the classification performance. CFS-DE is used to search for the most suitable set of features, while the weighted stacking algorithm increases the weights of the base classifiers that exhibit favorable training results and reduces the weights of those with unfavorable results. The system aims to enhance the efficiency of intrusion detection by decreasing the dimension of the features and improving the accuracy of the classification. Hajisalem and Babaei 33 proposed a novel hybrid classification approach that integrates two optimization algorithms, ABC and AFS. The approach incorporates Correlation-based Feature Selection and Fuzzy C-means clustering to partition the training dataset and eliminate irrelevant features. To differentiate between normal and anomalous records, their method uses CART to build If-Then rules based on the selected attributes. Asghari Varzaneh et al. 34 introduced a fuzzy rule-based classification framework to detect intrusions within computer network environments. To bolster the classification efficacy, the researchers devised a novel technique relying on Genetic Algorithms (GA) to optimize the rule weighting scheme. The proposed methodology was validated using the benchmark KDD99 dataset, and the experimental findings indicate that it significantly improves the detection accuracy and reduces the false alarm rate of the fuzzy rule-based classification system. Samadi Bonab et al. 35 introduced a method to detect the most important features for constructing an IDS and proposed a new hybrid method based on the FFA and ALO optimization algorithms to identify the optimal features and improve the performance of the IDS. The proposed method is intended to enhance the effectiveness of the IDS by identifying important features from a high-dimensional dataset. Emary et al. 36 proposed a binary variant of the ALO algorithm specifically designed for wrapper-based feature selection. They utilized a K-Nearest Neighbors (KNN) classifier and aimed to discover an ideal subset of features that maximizes classification performance. The proposed method was evaluated on 21 standard datasets against several evaluation criteria. In 37, a wrapper-based model was proposed using an adapted whale optimization algorithm (WOA) for intrusion detection. To overcome the issue of early convergence to a local optimum, the authors hybridized WOA with operators of the genetic algorithm. The suggested method uses the SVM algorithm to find important features in network data and accurately identify intrusions.
Alazzam et al. 38 developed a feature selection method for IDS that employs the PIO for the selection process. The authors also proposed a novel model for binarizing a continuous PIO and compared it to the traditional approaches. The developed model aims to enhance the performance of IDS by selecting the most important features from a high-dimensional dataset. Al-Yaseen et al. 39 proposed an optimized wrapper feature selection method to boost the efficiency and decrease the processing time of IDS. The method selects relevant features based on a differential evolution algorithm and then assesses the features using a classifier. Fatani et al. 40 developed new techniques for IDS feature extraction and selection using swarm intelligence algorithms. The authors designed a mechanism for extracting features with convolutional neural networks (CNN) and presented an alternative feature selection approach using the Aquila optimizer (AQU). The introduced approach aims to improve the effectiveness of IDS by identifying the best features from a high-dimensional dataset.
In 41, the researchers developed an intrusion detection model that makes use of an enhanced Random Forest (RF) classifier and BMRF optimization employing an adaptive S-shape operation. The RF classifier is applied for feature evaluation and to construct a model for intrusion detection, while the BMRF method is applied to determine which features from intrusion detection datasets are most relevant and to eliminate redundant and unnecessary ones. Otair et al. 42 proposed an enhanced GWO-based PSO for IDSs in wireless sensor networks. The proposed technique utilizes the GWO algorithm for feature selection and hybridizes it with PSO to incorporate the most advantageous data for every gray wolf position using the best value. The PSO algorithm preserves each individual's best position information to keep the GWO from getting trapped in a local optimum.
Machine learning is one of the techniques that can be used effectively to detect network intrusions, using a training dataset of network attacks. The dataset extracted from the network can include various features, such as network traffic, network resource usage, and user activities, which together describe each instance in the dataset.

Equilibrium optimizer
Faramarzi et al. 30 introduced the equilibrium optimizer (EO) in 2020. It is a physics-based metaheuristic that uses a dynamic mass balance model on a control volume and, for each optimization problem, takes the equilibrium state as the best solution. The EO starts from an initial population of concentration vectors in the search space, where every vector represents a candidate solution and is treated as a particle position. The initial population is generated using the following formula:

$$C_i^d = LB + rand_i^d \times (UB - LB), \quad i = 1, 2, \ldots, N, \quad d = 1, 2, \ldots, D \tag{1}$$

where N is the population size, D is the number of problem dimensions, LB and UB are the lower and upper bounds, and $C_i^d$ is the initial concentration of the ith candidate in dimension d. The random value $rand_i^d$ is in the range [0, 1].

The EO algorithm converges to an equilibrium state, which represents the outcome of the optimization process. However, the final equilibrium state is unknown during the search, so only equilibrium candidates are used to guide the individuals. The four best individuals found so far, ranked by fitness, make up the equilibrium candidates, which are meant to increase the capacity for exploration. To encourage better exploitation, the average of these four individuals is added as a fifth candidate. The vector that contains these five equilibrium candidates is called the equilibrium pool, and it is defined as follows:

$$\vec{C}_{eq,pool} = \left\{ \vec{C}_{eq(1)}, \vec{C}_{eq(2)}, \vec{C}_{eq(3)}, \vec{C}_{eq(4)}, \vec{C}_{eq(avg)} \right\}$$

$$\vec{C}_{eq(avg)} = \frac{\vec{C}_{eq(1)} + \vec{C}_{eq(2)} + \vec{C}_{eq(3)} + \vec{C}_{eq(4)}}{4}$$

where $\vec{C}_{eq,pool}$ is the equilibrium pool, $\vec{C}_{eq(1)}$, $\vec{C}_{eq(2)}$, $\vec{C}_{eq(3)}$, $\vec{C}_{eq(4)}$ are the top four candidates identified thus far, and $\vec{C}_{eq(avg)}$ is their average. In each iteration, every individual updates its concentration using a candidate selected from the pool at random, with the same probability for all five candidates. Equation (5) is used to update the concentration vectors:

$$\vec{C}_{new} = \vec{C}_{eq} + \left( \vec{C}_{old} - \vec{C}_{eq} \right) \cdot \vec{F} + \frac{\vec{G}}{\vec{\lambda} V} \left( 1 - \vec{F} \right) \tag{5}$$

where $\vec{C}_{old}$ and $\vec{C}_{new}$ denote the present and the new concentration vectors of an individual, respectively, $\vec{C}_{eq}$ is the concentration vector chosen randomly from the equilibrium pool, and V is the control volume, which is taken as unity in the standard EO. Equation (6) is used to calculate the vector $\vec{F}$, the exponential term:

$$\vec{F} = e^{-\vec{\lambda} (t - t_0)} \tag{6}$$

where $\vec{\lambda}$ is a random vector with D dimensions in the range [0, 1]. The parameter t decreases as the iterations advance and is calculated as follows, where Iter is the present iteration and Max_Iter is the maximum number of iterations:

$$t = \left( 1 - \frac{Iter}{Max\_Iter} \right)^{\left( a_2 \frac{Iter}{Max\_Iter} \right)} \tag{7}$$

where $a_2$ controls the exploitation capability and is set to 1. The value of $t_0$ is calculated with Eq. (8), and it controls the balance between exploration and exploitation:

$$\vec{t}_0 = \frac{1}{\vec{\lambda}} \ln \left( -a_1 \, sign(\vec{r} - 0.5) \left[ 1 - e^{-\vec{\lambda} t} \right] \right) + t \tag{8}$$

where $\vec{r}$ is a random vector in the range [0, 1] and $sign(\vec{r} - 0.5)$ determines the direction of exploration and exploitation during the search process. $a_1$ is a constant that controls the exploration capability, and its value is 1. The final form of the exponential term is obtained by substituting Eq. (8) into Eq. (6):

$$\vec{F} = a_1 \, sign(\vec{r} - 0.5) \left[ e^{-\vec{\lambda} t} - 1 \right] \tag{9}$$

One of the key factors in the EO influencing the exploitation capability is the generation rate $\vec{G}$. This parameter is calculated as follows:

$$\vec{G} = \vec{G}_0 \, e^{-\vec{\lambda} (t - t_0)} = \vec{G}_0 \cdot \vec{F} \tag{10}$$

where

$$\vec{G}_0 = \overrightarrow{GCP} \left( \vec{C}_{eq} - \vec{\lambda} \vec{C} \right) \tag{11}$$

$$\overrightarrow{GCP} = \begin{cases} 0.5 \, r_1, & r_2 \geq GP \\ 0, & r_2 < GP \end{cases} \tag{12}$$

where $r_1$ and $r_2$ are random values in [0, 1]. The vector $\overrightarrow{GCP}$ regulates the generation rate, while $\vec{G}_0$ is the initial generation rate vector. GP is the generation probability that is employed to strike a balance between exploration and exploitation, and it is set to GP = 0.5. Figure 1 shows the flowchart of the EO algorithm.
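To make the update rule concrete, the following is a minimal Python sketch of one EO concentration update for a single particle. It follows the standard EO formulation referenced by Eqs. (5), (7), and (9)-(12), with $a_1 = a_2 = 1$ and GP = 0.5 as stated above. The function name `eo_update`, the `volume` and `rng` parameters, and the small clamp on λ are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def eo_update(c_old, c_eq, iteration, max_iter,
              a1=1.0, a2=1.0, gp=0.5, volume=1.0, rng=random):
    """Return the new concentration vector of one particle (Eqs. 5, 7, 9-12)."""
    # Eq. (7): t decreases from 1 to 0 as the iterations advance.
    ratio = iteration / max_iter
    t = (1.0 - ratio) ** (a2 * ratio)

    # Eq. (12): generation rate control parameter, shared by all dimensions.
    r1, r2 = rng.random(), rng.random()
    gcp = 0.5 * r1 if r2 >= gp else 0.0

    c_new = []
    for old, eq in zip(c_old, c_eq):
        # lambda is drawn from [0, 1); the clamp (an assumption) avoids dividing by zero.
        lam = max(rng.random(), 1e-12)
        r = rng.random()
        sign = 1.0 if r >= 0.5 else -1.0
        f = a1 * sign * (math.exp(-lam * t) - 1.0)   # Eq. (9)
        g0 = gcp * (eq - lam * old)                  # Eq. (11)
        g = g0 * f                                   # Eq. (10)
        # Eq. (5): move around the equilibrium candidate.
        c_new.append(eq + (old - eq) * f + (g / (lam * volume)) * (1.0 - f))
    return c_new


# Hypothetical values, for illustration only.
pool_member = [0.4, 0.6, 0.1]
particle = [0.9, 0.2, 0.5]
print(eo_update(particle, pool_member, iteration=10, max_iter=100))
```

In the last iteration t reaches 0, so the exponential term vanishes and the particle lands exactly on the selected equilibrium candidate, which matches the intended shift from exploration to exploitation.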

Proposed algorithm
The standard equilibrium optimization (EO) algorithm suffers from two major problems: lack of population diversity and premature convergence. To overcome these issues, an enhanced version of EO is developed here with two main phases. In the first phase, the Opposition-Based Learning (OBL) technique is employed to improve population diversity 43. In OBL, for each candidate solution $X_i$ in the present population, an opposite solution $X'_i$ is generated using Eq. (13):

$$X'_i = a_i + b_i - X_i \tag{13}$$

where $a_i$ and $b_i$ are the upper and lower bounds of the search space, respectively.
The fittest solutions among the present population and the opposite population are then chosen for the following generation. This helps in the exploration of the search space and avoids premature convergence.
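The OBL phase described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the opposite point is computed as the sum of the two bounds minus the current value, the fitness is minimized, and the helper names `opposite` and `obl_select` are hypothetical.

```python
def opposite(solution, lower, upper):
    """Opposite point of a solution: bound sum minus the current value, per dimension."""
    return [lo + hi - x for x, lo, hi in zip(solution, lower, upper)]


def obl_select(population, lower, upper, fitness):
    """Keep the N fittest solutions from the population and its opposite population."""
    opposites = [opposite(s, lower, upper) for s in population]
    merged = sorted(population + opposites, key=fitness)  # lower fitness is better
    return merged[:len(population)]


# Toy example: minimize the first coordinate on the interval [0, 10].
survivors = obl_select([[8], [7]], lower=[0], upper=[10], fitness=lambda s: s[0])
print(survivors)
```

Because the merged pool contains both the original and the opposite solutions, the selection can never return a population worse than the one it started from.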
In the second phase, the Levy flight technique is applied to update the population. In Levy flight, new solutions are generated by a random walk whose step lengths follow a Levy distribution. The Levy distribution has an infinite variance and occasionally produces large steps, which enables fast exploration of the search space 44. At each generation, a fraction of the best solutions (70%) is updated using Levy flight, while the remaining solutions are updated using the traditional EO update equation. The Levy flight phase helps in escaping from local optima. New solutions are generated with the random walk of Eq. (14), where $C_{new(i)}$ and $C_{eq(i)}$ are the new and old solutions, respectively, and S is the step size. The step size is set to S = 1/t, where t is the iteration number, so the steps are larger at first and decrease over time. Levy(D) is a step drawn from a Levy distribution, and its calculation uses v, a random value drawn from a normal distribution. The pseudo-code of the introduced LOEO is shown in Algorithm 1.
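The exact Levy flight formulas are not reproduced in this text, so the sketch below should be read as one plausible realization rather than the authors' implementation. It assumes Mantegna's algorithm for drawing Levy-distributed steps (with a stability index β = 1.5) and an additive random walk around the old solution scaled by S = 1/t. The names `mantegna_sigma`, `levy_step`, and `levy_update` are hypothetical.

```python
import math
import random


def mantegna_sigma(beta=1.5):
    """Standard deviation of the numerator variable u in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one Levy-distributed step as u / |v|^(1/beta), with u and v normal."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)  # clamp avoids division by zero


def levy_update(c_eq, iteration, beta=1.5, rng=random):
    """Assumed additive random walk: old solution plus S * Levy step, with S = 1/t."""
    s = 1.0 / iteration  # iteration is counted from 1
    return [x + s * levy_step(beta, rng) for x in c_eq]


# Hypothetical solution vector, for illustration only.
print(levy_update([0.5, 0.5, 0.5], iteration=1))
print(levy_update([0.5, 0.5, 0.5], iteration=50))
```

With S = 1/t the perturbation in late iterations is typically much smaller than in early ones, although the heavy tail of the Levy distribution still allows occasional long jumps.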

Computational complexity
The computational complexity of the proposed LOEO algorithm is derived in this subsection. Computational complexity affects an algorithm's practical efficiency, so LOEO was designed to keep it low. The complexity is expressed in Big-O notation. The four primary factors that affect it are initialization, the iteration count, fitness function evaluation, and particle concentration updates. Defining the problem costs O(1), and the initialization phase costs O(N × D), where N is the number of particles and D is the number of problem dimensions. There are T iterations in total. Evaluating one particle costs O(C), so assessing the fitness of the population takes O(N × C) time. Memory saving costs O(N) time. The Opposition-Based Learning operator costs O(N × D), and the concentration update takes O(N × D) time. In addition, in every iteration the Levy flight operator is applied to M particles, which costs O(M × D). Collecting these terms, the overall time complexity of the LOEO algorithm is:

$$O(\mathrm{LOEO}) = O\big(1 + N \times D + T \times (N \times C + N + N \times D + N \times D + M \times D)\big)$$

Algorithm 1. Pseudo-code of the LOEO algorithm (excerpt of the main loop).
    Construct the equilibrium pool, {Ceq(1), Ceq(2), Ceq(3), Ceq(4), Ceq(avg)}
    Accomplish memory saving (if Iter > 1)
    Assign t using Eq. (7)
    for each particle i
        Select a candidate from the equilibrium pool randomly
        Generate random vectors λ, r
        Construct the F vector using Eq. (9)
        Construct the GCP vector using Eq. (12)
        Construct the G0 vector using Eq. (11)
        Construct the G vector using Eq. (10)
        Update the concentration using Eq. (5)
    end for
    for each particle i (70% of the best solutions are updated in this step)
        Update the concentration using Eq. (14)
    end for

A binary version of LOEO, called BLOEO, is developed to select the best features from three datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. It uses the threshold approach of Eq. (17) to transform continuous solutions into a binary representation:

$$b_i^d(t + 1) = \begin{cases} 1, & C_i^d(t + 1) > \theta \\ 0, & \text{otherwise} \end{cases} \tag{17}$$

where $b_i^d(t + 1)$ is the new binary position of the ith search individual in dimension d, and θ is a threshold set by the user to 0.5.
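The thresholding step can be illustrated with the short Python sketch below. It assumes that a dimension is selected when its continuous value is strictly greater than θ; whether the comparison is strict is an assumption, and the names `binarize` and `selected_features` are hypothetical.

```python
def binarize(position, theta=0.5):
    """Map a continuous position to a 0/1 feature mask using the threshold theta."""
    return [1 if x > theta else 0 for x in position]


def selected_features(mask):
    """Indices of the features that the mask keeps."""
    return [j for j, bit in enumerate(mask) if bit == 1]


# Hypothetical continuous solution with five dimensions.
mask = binarize([0.12, 0.73, 0.50, 0.91, 0.08])
print(mask, selected_features(mask))
```

A fixed threshold keeps the conversion deterministic, so two particles with the same continuous position always select the same feature subset.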
Feature selection is an optimization problem over binary variables. Each solution can be represented as a one-dimensional vector whose length equals the number of features in the dataset. Each element of the vector takes one of two values: "0" shows that the corresponding feature is not chosen, whereas "1" indicates that it is. A sample feature selection vector containing D features is represented in Fig. 2.
The fitness function employed in this problem has two primary goals, as stated in Eq. (18): to reduce the number of selected features and to maximize accuracy. The optimal solution obtains the highest accuracy for the classifier model while selecting the fewest possible features. To evaluate solutions, a KNN classifier is employed 45. In each iteration, a solution selects a subset of features, and the KNN classifier is trained on the data restricted to that subset and its accuracy is determined. As a result, the objective function is obtained as follows:

$$Fitness = \alpha \times E + \beta \times \frac{|F_i|}{D} \tag{18}$$

where E denotes the error rate of the KNN classifier, $|F_i|$ is the number of selected features in the subset $F_i$, and D is the total number of features. α and β weight the importance of accuracy and of the number of features in the subset, respectively. In this paper, α = 0.99 and β = 0.01, based on 46.
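As an illustration of this wrapper evaluation, the sketch below combines a small pure-Python KNN error estimate with the weighted objective (α = 0.99, β = 0.01). It is a simplified stand-in under stated assumptions: Euclidean distance, majority voting, a single train/test split, and the hypothetical function names `knn_error` and `fitness`. The toy data are invented for the example and are not taken from the evaluated datasets.

```python
from collections import Counter


def knn_error(train_x, train_y, test_x, test_y, mask, k=5):
    """Error rate of a KNN classifier that only sees the features kept by mask."""
    cols = [j for j, bit in enumerate(mask) if bit]

    def project(row):
        return [row[j] for j in cols]

    errors = 0
    for row, label in zip(test_x, test_y):
        p = project(row)
        # Squared Euclidean distance to every training sample, nearest first.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, project(t))), y)
            for t, y in zip(train_x, train_y)
        )[:k]
        vote = Counter(y for _, y in neighbours).most_common(1)[0][0]
        errors += vote != label
    return errors / len(test_y)


def fitness(error_rate, mask, alpha=0.99, beta=0.01):
    """Weighted sum of the error rate and the fraction of selected features."""
    return alpha * error_rate + beta * sum(mask) / len(mask)


# Toy data: feature 0 separates the classes, feature 1 is noise.
train_x = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [0.9, 0.0]]
train_y = [0, 0, 1, 1]
test_x = [[0.05, 3.0], [0.95, 2.0]]
test_y = [0, 1]
err = knn_error(train_x, train_y, test_x, test_y, mask=[1, 0], k=1)
print(err, fitness(err, [1, 0]))
```

Because β is small, the feature-count term mainly breaks ties between subsets with similar error rates, which is consistent with accuracy being the primary goal.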

Experimental results
In this section, the effectiveness of the BLOEO algorithm in identifying a better feature subset is examined on three datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. The results of our experiments are compared with other algorithms, including the Sine Cosine algorithm (SCA) 47, GWO 48, HHO 49, Differential Evolution (DE) 50, and Salp Swarm Algorithm (SSA) 51. To assess and contrast the proposed BLOEO algorithm with alternative methods, each algorithm is independently executed 20 times on a PC equipped with an Intel® Core™ i5 2.40 GHz processor and 6.0 GB of RAM. All experiments are run in MATLAB 2019b on Windows 10.

Datasets description
The NSL-KDD 52,53, CICIDS2017 54,55, and UNSW-NB15 56,57 datasets are often utilized for evaluating network IDSs (NIDS). The NSL-KDD dataset is an upgraded version of the KDD Cup99 dataset, with duplicate records removed and the data size reduced. Its simulated attacks include Denial of Service (DoS) attacks, User-to-Root (U2R) attacks, Remote-to-Local (R2L) attacks, and probe attacks. The CICIDS2017 dataset includes simulated real-world network traffic data and is divided into normal and attack behaviors, with attacks classified as brute force FTP, brute force SSH, DoS, heartbleed, web, infiltration, botnet, and DDoS attacks. The UNSW-NB15 dataset was constructed using the PerfectStorm tool to simulate nine distinct network attacks, such as DoS, ShellCode, Worms, Fuzzers, and Backdoors, among others.

Data preprocessing
In this section, the evaluated datasets are preprocessed in three main steps: data transformation, deletion of duplicate records, and data normalization 58.
Data transformation: The data features consist of both numbers and strings. To apply the proposed method to the dataset, the string features need to be converted to numerical values.
Deletion of duplicate records: In the next step, duplicate records are removed from the dataset to prevent biasing the classifiers towards frequent records. At this stage, a large number of duplicate records are removed from the KDDCUP 99 dataset. The two datasets NSL-KDD and UNSW-NB15 have no duplicate records. Additionally, missing values are managed at this stage.
Data normalization: In the next step, data normalization is carried out. During the scaling process, the values of each feature are mapped proportionally into the range [0, 1]. Equation (19) formulates the data normalization process in the range [0, 1] 59:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{19}$$

where x is an original feature value, $x_{min}$ and $x_{max}$ are the minimum and maximum values of that feature, and x' is the normalized value.
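The three preprocessing steps can be sketched as follows. The text does not specify how string features are encoded, so the integer encoding by order of first appearance used here is an assumption, as is mapping a constant feature to 0. The helper names `encode_strings`, `drop_duplicates`, and `min_max_normalize` are hypothetical.

```python
def encode_strings(column):
    """Replace each distinct string with an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(value, len(codes)) for value in column]


def drop_duplicates(records):
    """Remove repeated records while keeping the original order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def min_max_normalize(column):
    """Scale one feature column into [0, 1] with min-max normalization."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumption: a constant feature maps to 0
    return [(x - lo) / (hi - lo) for x in column]


print(encode_strings(["tcp", "udp", "tcp", "icmp"]))
print(drop_duplicates([[1, 2], [1, 2], [3, 4]]))
print(min_max_normalize([2, 4, 6]))
```

Normalizing every feature to the same range matters for a distance-based classifier such as KNN, because otherwise features with large numeric ranges would dominate the distance.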
Finally, the feature selection process is applied to reduce the number of features of the dataset and thereby increase the efficiency of classification. In this study, a wrapper-based feature selection method is proposed to reduce the number of dataset features using the proposed BLOEO algorithm.

Parameter settings
In all experiments and for all methods, a KNN classifier with k = 5 is used to evaluate candidate feature subsets and determine the optimal subset of features.
There are two reasons to choose KNN over other classifiers. Firstly, KNN is a simple yet powerful algorithm that can capture both linear and non-linear relationships within the data. This makes it well-suited for the exploratory nature of the feature selection task. Secondly, KNN requires minimal hyperparameter tuning, which aligns with the goal of maintaining a lightweight and efficient evaluation process during optimization. This combination of effectiveness and efficiency makes KNN a suitable choice as the classifier for guiding the feature selection algorithm toward the optimal subset of features. To train the KNN and assess the performance of the algorithms, each dataset is split into K folds for cross-validation. To be more precise, the dataset is divided at random into K equal parts (K = 10); K − 1 folds are utilized for training, while one fold is reserved as the testing set. The algorithms are executed 20 times independently, with the starting population drawn from a uniform random distribution. Moreover, the maximum number of iterations and the population size are set at 100 and 20, respectively, for all algorithms. The parameters of the KNN algorithm and of each metaheuristic algorithm were chosen through a systematic adjustment process that ensures computational feasibility. In this process, a balance is struck between model complexity and generalization performance on validation data, so that the algorithms can effectively explore the feature space without overfitting. The parameters of the optimization algorithms were also selected according to their effect on exploration and convergence. The algorithms' parameter settings are shown in Table 1.
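The K-fold procedure described above can be sketched as a small index generator. This is an illustrative version only: when the number of samples is not a multiple of K, the fold sizes differ by at most one, and the function name `k_fold_indices` and the `seed` parameter are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # K nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Each sample is used for testing exactly once across the 10 folds.
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

Averaging the classifier error over the folds gives the error rate that a wrapper method would feed into its fitness function for one candidate feature subset.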

Evaluation metrics
The proposed BLOEO and the comparative algorithms are evaluated based on various performance metrics, including fitness, the number of selected features, precision, accuracy, sensitivity (recall), specificity, and F-Score. The criteria were chosen to align with the main goal of this paper, which is to select crucial features and accurately predict network attacks. These criteria assess the algorithm's performance from various perspectives and help to strike a balance between algorithm complexity and performance. The measures are defined by Eqs. (20)-(24) 60,61:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Precision = \frac{TP}{TP + FP}$$

$$Sensitivity\ (Recall) = \frac{TP}{TP + FN}$$

$$Specificity = \frac{TN}{TN + FP}$$

$$F\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

In these equations, TP and TN represent the number of positive and negative samples, respectively, that the classifier correctly classifies. FN is the number of positive samples that the classifier wrongly classifies as negative, and FP is the number of negative samples that the classifier incorrectly classifies as positive.
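A short Python sketch shows how these measures follow from the four confusion-matrix counts. The function name `classification_metrics` and the convention of returning 0 when a denominator is zero are assumptions; the counts in the example are invented for illustration and are not results from the paper.

```python
def _ratio(numerator, denominator):
    """Safe division; returns 0.0 for an empty denominator (an assumption)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, specificity, and F-score from counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "f_score": _ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical confusion matrix: 8 TP, 9 TN, 1 FP, 2 FN.
for name, value in classification_metrics(8, 9, 1, 2).items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity alongside recall is useful for intrusion detection because a detector can reach high recall simply by raising many false alarms, which specificity exposes.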

Simulation results and discussion
The simulation results of the proposed binary LOEO algorithm on intrusion detection datasets are reported in this subsection. We analyze and discuss the findings by comparing the BLOEO model with state-of-the-art models.

Comparison of algorithms on the NSL-KDD dataset
To assess the LOEO algorithm's effectiveness in solving the feature selection problem, every experiment was carried out on three datasets: NSL-KDD, CICIDS2017, and UNSW-NB15. In Table 2, the BLOEO algorithm is compared with the competing algorithms in terms of accuracy, fitness, and number of selected features. All algorithms were executed 20 times, and the averages are reported in Table 2. The results on the NSL-KDD dataset show that, although all algorithms achieve relatively good accuracy and fitness, BLOEO performs better than the competing methods. The classification accuracy of the proposed BLOEO algorithm on NSL-KDD, 0.958, is better than that of the other algorithms. The algorithm is also superior to the competing algorithms in fitness and in the number of selected features, with values of 0.042 and 14.3, respectively. This can have a positive effect on the quality of the IDS. After BLOEO, the SCA algorithm is ranked second and has a relatively good performance in selecting informative features for intrusion detection. The convergence curves of BLOEO and the other compared algorithms are shown in Fig. 3. This figure shows that all the algorithms converge well, but BLOEO is the best among them. The BLOEO algorithm has been able to escape from local optima and, compared to the other algorithms, has obtained the lowest fitness value.

Comparison of algorithms on the UNSW-NB15 dataset
The simulation results on the UNSW-NB15 dataset in Table 2 show the efficiency of the BLOEO algorithm and its superiority over alternative methods. The accuracy and fitness values of the compared algorithms show that the BLOEO algorithm detects possible attacks on the computer network accurately and efficiently, with a significant margin over the other algorithms. Figure 4 shows the convergence curves of the metaheuristic algorithms along with the BLOEO algorithm. As shown in this figure, the BLOEO algorithm converges well within the initial iterations and reaches a value close to zero. In addition, in terms of the number of selected features, BLOEO is almost equal to the SCA algorithm and is better than the other algorithms. That is, it can obtain the desired classification accuracy by choosing fewer features. However, the comparison of the metaheuristic algorithms with the proposed BLOEO algorithm is not limited to the criteria considered above. Experiments are also performed on the three introduced intrusion detection datasets regarding precision, sensitivity, specificity, and F-score. The numerical findings from these tests are presented in Table 3. According to these findings, the BLOEO algorithm performs better than the others in almost all cases across the three datasets.

Comparison of algorithms on the CIC-IDS2017 dataset
The evaluation outcomes of the comparative algorithms and the BLOEO algorithm on the CICIDS2017 dataset are also presented in Table 2. In addition, Fig. 5 illustrates the convergence curves of all algorithms. According to this figure, the BLOEO algorithm has a good chance of escaping local optima and converging to the global best solution. With a larger number of iterations, it might achieve even better results by searching the space more thoroughly. In general, the presented numerical results and the convergence diagram indicate that the BLOEO algorithm is more successful than the other algorithms and has performed better in data classification and intrusion detection. Moreover, it has selected few features from this dataset and in this way creates an intrusion detection system with low complexity.
To show how the BLOEO algorithm performs in comparison to the other algorithms, Friedman's statistical test can be employed to rank them. Figure 6 shows the results of Friedman's test comparing the proposed algorithm and its competitors with regard to their fitness values. According to this figure, the BLOEO algorithm obtains the first rank among the competing algorithms, and the ranks differ greatly from one another. This result suggests that the BLOEO algorithm is also applicable to other optimization problems, especially binary ones such as feature selection.
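For readers unfamiliar with the ranking step of Friedman's test, the sketch below computes the mean rank of each algorithm and the Friedman chi-square statistic from a table of fitness values (rows are test cases, columns are algorithms, lower is better). It is a minimal illustration: the statistic is given without the tie correction, the function names are hypothetical, and the numbers in the example are invented rather than taken from the paper.

```python
def average_ranks(row):
    """Rank the values of one row (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = shared
        i = j + 1
    return ranks


def friedman(table):
    """Return (mean rank per algorithm, Friedman chi-square statistic)."""
    n, k = len(table), len(table[0])
    rank_rows = [average_ranks(row) for row in table]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


# Invented fitness values for three algorithms on three test cases.
toy = [[0.1, 0.2, 0.3], [0.1, 0.3, 0.2], [0.2, 0.3, 0.4]]
print(friedman(toy))
```

The algorithm with the lowest mean rank is the one that most consistently achieves the best fitness, which is the sense in which an algorithm obtains the first rank in such a test.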
Apart from the algorithms that were compared in the preceding section, the BLOEO algorithm was also compared with state-of-the-art algorithms presented in recent years. These include BHOA 25, BIMEO 31, and the methods of Tama et al. 62, Alazzam et al. 38, and Kareem et al. 63.
Table 4 presents the evaluation findings of the compared algorithms on NSL-KDD, CICIDS2017, and UNSW-NB15. As observed, the BLOEO algorithm has the highest performance on the NSL-KDD dataset in terms of all the criteria considered in this table. In addition, BLOEO, BHOA, BIMEO, and the GTO-BSA algorithm proposed by Kareem et al. are evaluated on CICIDS2017. The numerical results in Table 4 show that the accuracy and fitness of GTO-BSA, with values of 0.987 and 0.013, respectively, as well as its specificity, are better than those of the other algorithms. The precision and sensitivity of the BLOEO algorithm, however, are better than those of the other competitive algorithms. On UNSW-NB15, the proposed algorithm has been contrasted with four of the introduced algorithms, and the numerical results show its superiority over them.

Conclusion and future works
This paper proposed an enhanced variant of the EO algorithm, called BLOEO, to select effective features for IDSs. The BLOEO algorithm utilizes opposition-based learning to enhance population diversity and a Levy flight mechanism to escape local optima. The OBL helped the population explore a wider search space and escape from local optima. The Levy flight mechanism further improved the exploratory ability of the algorithm. Overall, the BLOEO algorithm provides an effective method for feature selection that can enhance the efficiency and scalability of IDSs. Experimental results on three datasets demonstrate that the BLOEO algorithm can drastically cut the feature count while retaining good accuracy. Directions for future research include applying the BLOEO algorithm to other feature selection problems and datasets to further evaluate its performance.

Figure 2. Solution representation for the feature selection problem.

Figure 4. Comparison of convergence curve of algorithms on the UNSW-NB15 dataset.

Figure 5. Comparison of convergence curve of algorithms on the CICIDS2017 dataset.

Table 2. Comparison results of algorithms in terms of accuracy, fitness, and the number of selected features.

Table 3. Comparison results of algorithms in terms of precision, sensitivity, specificity, and F-score.

Table 4. Comparison results of BLOEO algorithm with state-of-the-art algorithms. Significant values are in bold.